Boston Motor Vehicle Incidents

Introduction

In this notebook we analyse the Crime Incident Reports for the city of Boston, US, for the years 2016-2018. The full dataset contains records of events from June 2015 to January 2019. These reports are provided by the Boston Police Department (BPD) "to document the initial details surrounding an incident to which BPD officers respond" (Source).

Since more than 355k events are registered in this dataset, we will focus only on the incidents related to motor vehicles.

What is the structure of your dataset?

Police responses are logged and reported chronologically by the Boston Police Department.

What are the main feature(s) of interest in your dataset?

We will be focusing on the "OFFENSE_CODE_GROUP" column/feature, which is the "internal categorization" of the incident.

What features in the dataset do you think will help support your investigation into your feature(s) of interest?

The "OCCURRED_ON_DATE" column will be one of our most valuable assets: we will look at days, months and years. The "INCIDENT_NUMBER" column will help us count events. "SHOOTING" indicates whether a shooting took place.
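As a rough illustration of how the date column supports the day, month and year breakdowns, the sketch below parses a two-row in-memory sample (values copied from the df.head() preview in this notebook) and derives calendar fields with the pandas `.dt` accessor. The `sample` name and the use of `io.StringIO` in place of the real CSV file are assumptions made for illustration; the real dataset already ships with YEAR, MONTH, DAY_OF_WEEK and HOUR columns.

```python
import io
import pandas as pd

# Tiny in-memory stand-in for the real CSV (assumption: the real file has many more columns).
sample_csv = io.StringIO(
    "INCIDENT_NUMBER,OFFENSE_CODE_GROUP,SHOOTING,OCCURRED_ON_DATE\n"
    "I192004093,Medical Assistance,,2019-01-15 21:18:00\n"
    "I192004088,Larceny,,2019-01-15 21:30:00\n"
)

# parse_dates converts the column to datetime64 so the .dt accessor is available.
sample = pd.read_csv(sample_csv, parse_dates=["OCCURRED_ON_DATE"])

# Derive the calendar fields used throughout the analysis.
sample["YEAR"] = sample["OCCURRED_ON_DATE"].dt.year
sample["MONTH"] = sample["OCCURRED_ON_DATE"].dt.month
sample["DAY_OF_WEEK"] = sample["OCCURRED_ON_DATE"].dt.day_name()
sample["HOUR"] = sample["OCCURRED_ON_DATE"].dt.hour

print(sample[["INCIDENT_NUMBER", "YEAR", "MONTH", "DAY_OF_WEEK", "HOUR"]])
```

Parsing the dates once at load time keeps later cells such as `auto['OCCURRED_ON_DATE'].dt.hour` working without further conversion.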

In [4]:
df.head()
Out[4]:
INCIDENT_NUMBER OFFENSE_CODE OFFENSE_CODE_GROUP OFFENSE_DESCRIPTION DISTRICT REPORTING_AREA SHOOTING OCCURRED_ON_DATE YEAR MONTH DAY_OF_WEEK HOUR UCR_PART STREET Lat Long Location
0 I192004100 2907 Violations VAL - OPERATING AFTER REV/SUSP. NaN NaN 2019-01-15 20:39:00 2019 1 Tuesday 20 Part Two NaN 42.306714 -71.087418 (42.30671431, -71.08741801)
1 I192004093 3006 Medical Assistance SICK/INJURED/MEDICAL - PERSON E13 906 NaN 2019-01-15 21:18:00 2019 1 Tuesday 21 Part Three WALDEN ST 42.325610 -71.104500 (42.32561013, -71.10449956)
2 I192004088 619 Larceny LARCENY ALL OTHERS C11 402 NaN 2019-01-15 21:30:00 2019 1 Tuesday 21 Part One BURT ST 42.284135 -71.069574 (42.28413536, -71.06957385)
3 I192004086 3201 Property Lost PROPERTY - LOST C11 387 NaN 2019-01-15 20:48:00 2019 1 Tuesday 20 Part Three ADAMS ST 42.272306 -71.067214 (42.27230624, -71.06721386)
4 I192004085 3108 Fire Related Reports FIRE REPORT - HOUSE, BUILDING, ETC. D4 285 NaN 2019-01-15 20:29:00 2019 1 Tuesday 20 Part Three TREMONT ST 42.336409 -71.085650 (42.33640891, -71.08565039)
In [6]:
df['OFFENSE_CODE_GROUP'].value_counts().head(10)
Out[6]:
Motor Vehicle Accident Response    41416
Larceny                            29048
Medical Assistance                 26513
Investigate Person                 20717
Other                              20019
Drug Violation                     18358
Simple Assault                     17668
Vandalism                          17024
Verbal Disputes                    14660
Towed                              12558
Name: OFFENSE_CODE_GROUP, dtype: int64
In [7]:
sns.catplot(y='OFFENSE_CODE_GROUP',
            kind='count',
            height=8,
            aspect=1.5,
            order=df['OFFENSE_CODE_GROUP'].value_counts().head(10).index,
            data=df);

We will only analyze years 2016, 2017 and 2018 for the following reasons:

  • Events for 2015 start in June, so only about half of that year is covered.
  • We only have two weeks of data for 2019.
  • Focusing on 2016-2018 gives a more consistent basis for comparison and avoids misleading conclusions drawn from partial years.

*Please keep in mind the visualizations below will display event counts for all three years, unless otherwise specified.
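The cell that builds the `auto` DataFrame used below is not shown in this notebook. A minimal sketch of how such a subset could be built follows, assuming a year filter on the YEAR column and a hand-picked list of vehicle-related categories. The `df_sample` data, the `VEHICLE_GROUPS` list and the `auto_sample` name are hypothetical; the actual categories kept by the author may differ.

```python
import pandas as pd

# Hypothetical sample standing in for the full `df`.
df_sample = pd.DataFrame({
    "INCIDENT_NUMBER": ["A1", "A2", "A3", "A4", "A5"],
    "OFFENSE_CODE_GROUP": [
        "Towed",
        "Larceny",
        "Auto Theft",
        "Motor Vehicle Accident Response",
        "Towed",
    ],
    "YEAR": [2015, 2016, 2017, 2018, 2019],
})

# Assumed list of vehicle-related categories (not confirmed by the notebook).
VEHICLE_GROUPS = ["Motor Vehicle Accident Response", "Towed", "Auto Theft"]

# Keep only the complete years 2016-2018 (between() is inclusive on both ends).
in_range = df_sample["YEAR"].between(2016, 2018)
# Keep only the vehicle-related categories.
is_vehicle = df_sample["OFFENSE_CODE_GROUP"].isin(VEHICLE_GROUPS)

# .copy() avoids chained-assignment warnings if columns are added later.
auto_sample = df_sample[in_range & is_vehicle].copy()
print(auto_sample)
```

Combining both boolean masks in one step keeps the filter easy to audit: changing the year range or the category list only touches one line each.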

In [12]:
sns.set(rc={'figure.figsize':(15,6)})
sns.countplot(x='OFFENSE_CODE_GROUP',data=auto, order=auto['OFFENSE_CODE_GROUP'].value_counts().index)
plt.xticks(rotation=45)
plt.ylabel('No of Incidents')
plt.xlabel("");
plt.title("Incident Categories", size=35)
plt.show()

Although most police responses are listed as "Motor Vehicle Accident Response", many others resulted in vehicles being towed. Larceny and Auto Theft are closely related; combined, they would rank second in the list.

Let's look at the incidents per hour in the day.

In [13]:
sns.catplot(x='HOUR',
            kind='count',
            height=8.27,
            aspect=3,
            color='lightblue',
            data=auto)
plt.xticks(size=20)
plt.yticks(size=20)
plt.xlabel('Hour', fontsize=30);
plt.ylabel('Count', fontsize=30);
plt.title("Incidents per Hour in the Day", size=50);

Most incidents take place during the morning and afternoon rush hours.

Let's divide these by category to get a better sense of the incident events.

In [14]:
auto.groupby([auto['OCCURRED_ON_DATE'].dt.hour,'OFFENSE_CODE_GROUP',])['INCIDENT_NUMBER'].count().unstack().plot(marker='o', figsize=(15,10))
plt.ylabel('No of Incidents');
plt.xlabel('Hour of the day');
plt.legend(fontsize="x-large");
plt.xticks(np.arange(24));
plt.title("Incidents per Hour in the Day", size=40);

In Boston, motor vehicle accidents are most frequent at around 5pm.
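The peak hour can also be read off numerically rather than from the chart. The sketch below uses the same group-and-unstack pattern as the cell above on a small made-up sample, then takes `idxmax()` per category. The `sample` data and the `peak_hour` name are hypothetical and only illustrate the technique.

```python
import pandas as pd

# Made-up sample: hour of day and category for seven incidents.
sample = pd.DataFrame({
    "HOUR": [8, 17, 17, 17, 9, 9, 17],
    "OFFENSE_CODE_GROUP": ["Motor Vehicle Accident Response"] * 4 + ["Towed"] * 3,
})

# Rows are hours, columns are categories, cells are incident counts.
counts = (
    sample.groupby(["HOUR", "OFFENSE_CODE_GROUP"])
          .size()
          .unstack(fill_value=0)
)

# For each category (column), the hour (index label) with the highest count.
peak_hour = counts.idxmax()
print(peak_hour)
```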

In [15]:
sns.catplot(x='DAY_OF_WEEK',
            kind='count',
            height=8,
            aspect=3,
            data=auto)
plt.xticks(size=30)
plt.yticks(size=30)
plt.xlabel('');
plt.ylabel('Count', fontsize=40);
plt.title("Incidents per Day of the Week", size=50);

Friday has the most incidents, followed by Saturday.
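Because DAY_OF_WEEK holds plain strings, a count plot draws the days in whatever order they appear in the data unless an order is given. A minimal sketch of counting incidents in calendar order follows; the `WEEKDAYS` list, the `sample` data and the `per_day` name are assumptions for illustration.

```python
import pandas as pd

# Calendar order for the weekday labels used in the DAY_OF_WEEK column.
WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]

# Made-up sample of weekday labels.
sample = pd.DataFrame({
    "DAY_OF_WEEK": ["Friday", "Monday", "Friday", "Saturday", "Monday", "Friday"],
})

# value_counts() sorts by frequency; reindex() restores calendar order
# and fills days that never occur with 0.
per_day = sample["DAY_OF_WEEK"].value_counts().reindex(WEEKDAYS, fill_value=0)

# Label of the day with the most incidents.
busiest = per_day.idxmax()
print(per_day)
print(busiest)
```

The same `WEEKDAYS` list could be passed as `order=WEEKDAYS` to `sns.catplot` so that the bars run from Monday to Sunday.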

How about the months of the year?

In [16]:
months = ['Jan','Feb','Mar','Apr','May','Jun','Jul','Aug','Sep','Oct','Nov','Dec']
sns.catplot(x='MONTH',
            kind='count',
            height=8,
            aspect=3,
            color='lightblue',
            data=auto)
plt.xticks(np.arange(12), months, size=30)
plt.yticks(size=30)
plt.xlabel('');
plt.ylabel('Count', fontsize=40);
plt.title("Incidents per Months in the Year", size=50);

August is the month in which most incidents take place.

In [17]:
months=['', 'Jan','Feb','Mar','Apr','May','Jun','Jul','Aug','Sep','Oct','Nov','Dec']
auto.groupby('MONTH')['INCIDENT_NUMBER'].count().plot(marker='o', color='red', linewidth=2, markersize=12, markerfacecolor='lightblue', figsize=(15, 5))
plt.xticks(np.arange(0,13, 1),months)
plt.ylabel('No of Incidents');
plt.title("Incidents per Months in the Year", size=40);
In [18]:
months=['','Jan','Feb','Mar','Apr','May','Jun','Jul','Aug','Sep','Oct','Nov','Dec']

auto.groupby([auto['OCCURRED_ON_DATE'].dt.month,'OFFENSE_CODE_GROUP',])['INCIDENT_NUMBER'].count().unstack().plot(marker='o', figsize=(15,10));
plt.ylabel('No of Incidents');
plt.legend(fontsize="x-large");
plt.xlabel("")
plt.xticks(np.arange(0, 13, 1),months);
plt.title("Type of Incidents per Months in the Year", size=37);

What if we look at every year separately?

Notice that 2017 had a greater number of incidents than both 2016 and 2018.

The only months in which the number of incidents was higher in 2018 than in 2017 were June and November.

In [19]:
auto.groupby(['MONTH','YEAR'])['INCIDENT_NUMBER'].count().unstack().plot(kind='bar', figsize=(15, 6));
plt.ylabel('No of Incidents');
plt.xlabel("Month of the year");
plt.legend(loc='center left', bbox_to_anchor=(1, .8), fontsize="x-large");
plt.title("Incidents per Month by Year", size=37);
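The year-over-year comparison can also be checked numerically instead of by eye. The sketch below builds the same month-by-year table as the bar chart on a small made-up sample and lists the months where 2018 exceeds 2017. The `sample` data and the `higher_in_2018` name are hypothetical.

```python
import pandas as pd

# Made-up sample: three incidents in June and three in July.
sample = pd.DataFrame({
    "MONTH": [6, 6, 6, 7, 7, 7],
    "YEAR": [2017, 2018, 2018, 2017, 2017, 2018],
    "INCIDENT_NUMBER": ["B1", "B2", "B3", "B4", "B5", "B6"],
})

# Rows are months, columns are years, cells are incident counts.
by_month_year = (
    sample.groupby(["MONTH", "YEAR"])["INCIDENT_NUMBER"]
          .count()
          .unstack(fill_value=0)
)

# Months where the 2018 count is strictly greater than the 2017 count.
mask = by_month_year[2018] > by_month_year[2017]
higher_in_2018 = by_month_year.index[mask].tolist()
print(higher_in_2018)
```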

Among incidents where a shooting took place, the majority resulted in towings.

In [21]:
sns.set(rc={'figure.figsize':(10,6)})
sns.countplot(x='OFFENSE_CODE_GROUP',data=auto[auto.SHOOTING== "Y"], order=auto['OFFENSE_CODE_GROUP'].value_counts().index)
plt.xticks(rotation=45)
plt.title('No of Incidents where there were shootings')
plt.show()
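To put a number on "the majority", the share of each category among shooting incidents can be computed directly. This is a minimal sketch on made-up data, assuming (as in the cell above) that SHOOTING holds "Y" when a shooting took place and is missing otherwise. The `sample`, `with_shooting` and `share` names are hypothetical.

```python
import pandas as pd

# Made-up sample: SHOOTING is "Y" or missing.
sample = pd.DataFrame({
    "OFFENSE_CODE_GROUP": ["Towed", "Towed", "Auto Theft", "Towed",
                           "Motor Vehicle Accident Response"],
    "SHOOTING": ["Y", "Y", "Y", None, None],
})

# Keep only incidents flagged as shootings.
with_shooting = sample[sample["SHOOTING"] == "Y"]

# normalize=True returns fractions of the total instead of raw counts.
share = with_shooting["OFFENSE_CODE_GROUP"].value_counts(normalize=True)
print(share)
```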

We will plot these on a map to get a better feel for the geographical distribution.

This Heatmap serves as a "Heat-Check".

In [23]:
# Create basic Folium crime map
# (assumes `import folium` and `from folium.plugins import HeatMap` ran in an earlier cell)
crime_map = folium.Map(location=[42.3125, -71.0875],
                       zoom_start=11)

# Add data for heatmap: keep only the coordinate columns and drop rows without coordinates
auto_heatmap = auto[['Lat', 'Long']].dropna()
# Convert to a list of [lat, long] pairs, as expected by HeatMap
auto_heatmap = auto_heatmap.values.tolist()
# Use at most the first 50,000 points
HeatMap(auto_heatmap[:50000], radius=10).add_to(crime_map)

# Plot!
crime_map
Out[23]:

We can see that these are happening all over the city!

In [24]:
# Mark the first 2,000 rows of `auto` with valid coordinates
# (the most recent incidents if the data is sorted newest-first).
# They are spread all over the city.
incident_map = folium.Map(width=800,
                          height=500,
                          location=[42.33, -71.070],
                          zoom_start=12)
# Drop rows without coordinates up front instead of hiding errors behind a bare except
subset = auto.dropna(subset=['Lat', 'Long']).head(2000)
for _, row in subset.iterrows():
    folium.Marker([row['Lat'], row['Long']], popup=row['STREET']).add_to(incident_map)
incident_map
Out[24]:

These are the 5 coordinates you want to stay away from in Boston!

In [27]:
# The farther you stay away from these coordinates, the safer!
# (named hotspot_map to avoid shadowing Python's built-in map)
hotspot_map = folium.Map(width=800, height=500, location=[42.33, -71.070], zoom_start=12)

folium.Marker([42.32696647, -71.06198607]).add_to(hotspot_map)
folium.Marker([42.33152148, -71.07085307]).add_to(hotspot_map)
folium.Marker([42.36067984, -71.05482325]).add_to(hotspot_map)
folium.Marker([42.32809966, -71.06321676]).add_to(hotspot_map)
folium.Marker([42.36183857, -71.05976489]).add_to(hotspot_map)
hotspot_map
Out[27]:
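The notebook does not show how the five coordinates above were selected. One plausible way to find the most frequent locations is to count incidents per exact (Lat, Long) pair and keep the largest groups, as in the sketch below. The `sample` coordinates are made up and `top_spots` is a hypothetical name; this is an assumption about the approach, not the author's confirmed method.

```python
import pandas as pd

# Made-up coordinates; one location repeats three times, one row has no coordinates.
sample = pd.DataFrame({
    "Lat": [42.3270, 42.3270, 42.3315, 42.3607, 42.3270, None],
    "Long": [-71.0620, -71.0620, -71.0709, -71.0548, -71.0620, None],
})

# Count incidents per exact coordinate pair and keep the two most frequent.
top_spots = (
    sample.dropna(subset=["Lat", "Long"])
          .groupby(["Lat", "Long"])
          .size()
          .nlargest(2)
)
print(top_spots)
```

Grouping on exact coordinates only works when repeated incidents are logged at identical points; rounding Lat and Long before grouping would be an alternative if they are not.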

Main Findings

  • Although most police responses are listed as "Motor Vehicle Accident Response", many others resulted in vehicles being towed. Larceny and Auto Theft are closely related; combined, they would rank second in the list.
  • Most incidents take place during the morning and afternoon rush hours.
  • In Boston, motor vehicle accidents are most frequent at around 5pm.
  • Friday has the most incidents, followed by Saturday.
  • August is the month in which most incidents take place.
  • 2017 had a greater number of incidents than both 2016 and 2018.
  • The only months in which the number of incidents was higher in 2018 than in 2017 were June and November.
  • Among incidents where a shooting took place, the majority resulted in towings.
  • Motor Vehicle related incidents occur all over the city and not just in a specific area.